The manual extraction of data from electronic documents, such as scanned images, is costly in both time and money. Such inefficiencies can cause a backlog of hundreds of thousands of documents from which a particular business or industry must extract data. Often, such electronic or scanned documents do not include a text layer. Thus, in a manual extraction process, a human must first identify the particular page or pages of the documents from which data is to be extracted. Such a process is time consuming and prone to error. Further steps within the manual process are also time consuming and include, for example, separating the page or pages into a separate electronic document and correcting optical character recognition (OCR) errors where needed.